Indexing the Earth Mover's Distance Using Normal Distributions
Authors
Abstract
Querying uncertain data sets (represented as probability distributions) presents many challenges, owing to the large volume of data involved and the difficulty of comparing uncertainty between distributions. The Earth Mover’s Distance (EMD) is increasingly used to compare uncertain data because it effectively captures the differences between two distributions. Computing the EMD requires solving the transportation problem, which is computationally intensive. In this paper, we propose a new lower bound to the EMD and an index structure that together significantly improve the performance of EMD-based K-nearest neighbor (K-NN) queries on uncertain databases. The lower bound approximates the EMD on a projection vector: each distribution is projected onto a vector and approximated by a normal distribution together with an accompanying error term. We then represent each normal distribution as a point in a Hough-transformed space and use the concept of stochastic dominance to implement an efficient index structure in that space. We show that our method significantly decreases K-NN query time on uncertain databases. The index structure also scales well with database cardinality. It is well suited to heterogeneous data sets, helping to keep EMD-based queries tractable as uncertain data sets become larger and more complex.
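The filter-and-refine idea behind a lower-bound-driven K-NN search can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's method: each distribution is assumed to be uniform over an equal-size set of points with Euclidean ground distance; the exact EMD is brute-forced over permutations (valid only in this equal-weight case, and only for tiny inputs); and in place of the paper's normal-plus-error-term bound it uses a simpler swapped-in bound, the gap between the means of the projected values. All function names (`unit`, `exact_emd`, `project`, `emd_1d`, `normal_params`, `lower_bound`, `knn`) are hypothetical. The bound chain holds because projecting onto a unit vector cannot lengthen any distance, and a 1-D EMD is never smaller than the difference of the means.

```python
import itertools
import math
import statistics


def unit(v):
    """Scale v to unit length so projection never lengthens a distance."""
    norm = math.hypot(*v)
    return tuple(x / norm for x in v)


def exact_emd(a, b):
    """EMD between uniform distributions on two equal-size point sets.

    Assumption: equal counts and equal weights, so an optimal transport plan
    is a permutation and brute force is exact, but O(n!): tiny inputs only.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-size point sets")
    n = len(a)
    return min(
        sum(math.dist(a[i], b[j]) for i, j in enumerate(perm))
        for perm in itertools.permutations(range(n))
    ) / n


def project(points, v):
    """Dot each point with the unit vector v, giving 1-D values."""
    return [sum(x * y for x, y in zip(p, v)) for p in points]


def emd_1d(xs, ys):
    """1-D EMD for equal-size uniform samples: pair the sorted values."""
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


def normal_params(xs):
    """Summarise projected values by a normal's (mean, std) pair."""
    return statistics.fmean(xs), statistics.pstdev(xs)


def lower_bound(pa, pb):
    """|mean_a - mean_b| <= 1-D EMD of projections <= full EMD.

    Hypothetical simple bound: it uses only the means and ignores sigma.
    """
    return abs(pa[0] - pb[0])


def knn(query, database, k, v):
    """Filter-and-refine K-NN: compute exact EMD only while a bound can win."""
    v = unit(v)
    q = normal_params(project(query, v))
    candidates = sorted(
        (lower_bound(q, normal_params(project(obj, v))), idx)
        for idx, obj in enumerate(database)
    )
    best = []  # (exact EMD, index) pairs, ascending, at most k of them
    for lb, idx in candidates:
        if len(best) == k and lb > best[-1][0]:
            break  # every remaining bound exceeds the current k-th distance
        best.append((exact_emd(query, database[idx]), idx))
        best.sort()
        del best[k:]
    return best
```

Because candidates are visited in increasing order of their bound, the loop can stop once a bound exceeds the current K-th exact distance: every later object has a bound, and hence an EMD, at least that large. A real index would precompute each object's summary instead of recomputing it per query; the paper's transformed-space index and stochastic-dominance pruning are not modelled here.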
Similar resources
Earth Mover's Distance and Equivalent Metrics for Spaces with Semigroups
We introduce a multi-scale metric on a space equipped with a diffusion semigroup. We prove, under some technical conditions, that the norm dual to the space of Lipschitz functions with respect to this metric is equivalent to two other norms, one of which is a weighted sum of the averages at each scale, and one of which is a weighted sum of the difference of averages across scales. The notion of 's...
Indexing Earth Mover's Distance over Network Metrics
The Earth Mover’s Distance (EMD) is a well-known distance metric for data represented as probability distributions over a predefined feature space. Supporting EMD-based similarity search has attracted intensive research effort. Despite the plethora of literature, most existing solutions are optimized for Lp feature spaces (e.g., Euclidean space); while in a spectrum of applications, the relatio...
The Earth Mover's Distance is the Mallows Distance: Some Insights from Statistics
The Earth Mover’s distance was first introduced as a purely empirical way to measure texture and color similarities. We show that it has a rigorous probabilistic interpretation and is conceptually equivalent to the Mallows distance on probability distributions. The two distances are exactly the same when applied to probability distributions, but behave differently when applied to unnormalized d...
Earth Mover's Distance Minimization for Unsupervised Bilingual Lexicon Induction
Cross-lingual natural language processing hinges on the premise that there exists invariance across languages. At the word level, researchers have identified such invariance in the word embedding semantic spaces of different languages. However, in order to connect the separate spaces, cross-lingual supervision encoded in parallel data is typically required. In this paper, we attempt to establis...
Supervised Earth Mover's Distance Learning and Its Computer Vision Applications
Earth Mover’s Distance (EMD) is an intuitive and natural distance metric for comparing two histograms or probability distributions. We propose to jointly optimize the ground distance matrix and the EMD flow-network based on partial ordering of histogram distances in an optimization framework. Two applications in computer vision are used to demonstrate the effectiveness of the algorithm: firstly...
Journal title:
- PVLDB
Volume 5, Issue
Pages: -
Publication date: 2011